An Inverted Index for Storing and Retrieving Grammatical Dependencies
نویسندگان
چکیده
Web count statistics gathered from search engines have been widely used as a resource in a variety of NLP tasks. For some tasks, however, the information they exploit is not fine-grained enough. We propose an inverted index over grammatical relations as a fast and reliable resource to access more general and also more detailed frequency information. To build the index, we use a dependency parser to parse a large corpus. We extract binary dependency relations, such as he-subj-say (he is the subject of say) as index terms and construct the index using publicly available open-source indexing software. The unit we index over is the sentence. The index can be used to extract grammatical relations and frequency counts for these relations. The framework also provides the possibility to search for partial dependencies (say, the frequency of he occurring in subject position), words, strings and a combination of these. One possible application is the disambiguation of syntactic structures.
منابع مشابه
Compressing Inverted Index Using Optimal FastPFOR
Indexing plays an important role for storing and retrieving the data in Information Retrieval System (IRS). Inverted Index is the most frequently used indexing structure in IRS. In order to reduce the size of the index and retrieve the data efficiently, compression schemes are used, because the retrieval of compressed data is faster than uncompressed data. High speed compression schemes can imp...
متن کاملText retrieval and the relational model
In this article, Macleod examines the suitability of the relational model in the context of storing and retrieving documents. The relational model is compared with other traditional models and their strengths and weaknesses are compared. Through out the article, Macleod largely compares the relational model against the text model. He states that ‘the terminology text model is used to refer to i...
متن کاملInverted indexes: Types and techniques
There has been a s ubstantial amount of research on high performance inverted index because most web and search engines use an inverted index to execute queries. Documents are normally stored as lists of words, but inverted indexes invert this by storing for each word the list of documents that the word appears in, hence the name “inverted index”. This paper presents the crucial research findin...
متن کاملSearching Large Lexicons for Partially Specified Terms using Compressed Inverted Files
There are many advantages to be gained by storing the lexicon of a full text database in main memory. In this paper we describe how to use a compressed inverted file index to search such a lexicon for entries that match a pattern or partially specified term. This method provides an effective compromise between speed and space, running orders of magnitude faster than brute force search, but requ...
متن کاملAggregation-Aware Top-k Computation for Full-Text Search
A typical scenario in information retrieval and web search is to index a given type of items (e.g., web pages, images) and provide search functionality for them. In such a scenario, the basic units of indexing and retrieval are the same. Extensive study has been done for efficient top-k computation in such settings. This paper studies top-k processing for many emerging scenarios: efficiently re...
متن کامل